Video-Engine-Starter

The problem

Awaiting asynchronous work inside React components is a recipe for broken programmatic video. React is built to render UI state, not to coordinate multi-second media buffers. If you fetch speech tracks or calculate text animation timing on the fly inside Remotion compositions, you introduce timing errors, and in the rendered video the overlays drift out of sync with the audio track.

Programmatic video requires a **Temporal Authority**: every timing decision must be made before React starts rendering frames. The Node.js orchestrator generates the assets, reads their durations to millisecond precision with a CLI utility (ffprobe), converts those durations into integer frame counts at your target framerate, and feeds those integers into React as props. React then acts as a flat renderer that reads numbers without doing any timing math.
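The duration query can be sketched as below. This is a minimal illustration, not the repository's code: the function names `probeDurationSeconds` and `parseDurationSeconds` are hypothetical, and the sketch assumes ffprobe (shipped with FFmpeg) is available on the PATH.

```javascript
// Hypothetical helper: ask ffprobe for an audio file's duration in seconds.
// Assumes the ffprobe binary is installed and on the PATH.
async function probeDurationSeconds(filePath) {
  // Dynamic imports keep this sketch loadable as either ESM or CommonJS.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { stdout } = await promisify(execFile)("ffprobe", [
    "-v", "error",                               // print errors only
    "-show_entries", "format=duration",          // container duration only
    "-of", "default=noprint_wrappers=1:nokey=1", // bare value, no labels
    filePath,
  ]);
  return parseDurationSeconds(stdout);
}

// ffprobe prints the duration as decimal seconds, e.g. "12.345000",
// or "N/A" when it cannot determine one.
function parseDurationSeconds(stdout) {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(
      `Could not read a duration from ffprobe output: ${JSON.stringify(stdout)}`
    );
  }
  return seconds;
}
```

In an `.mjs` module the two dynamic imports would normally be static `import` statements at the top of the file. Passing arguments as an array to `execFile` avoids shell quoting problems with file names.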

How it works: step by step

Step 1: Brief-to-Script. The user submits a text brief. A script agent (classifying tasks via the fallback router) writes a structured JSON document: a hook sentence, five narrative segments, visual hints, and a Call-To-Action (CTA).
Step 2: Speech Generation. The segment texts are parsed and sent to Edge TTS. Edge TTS generates clean, human-like voice recordings as MP3 files under 60 seconds at zero cost.
Step 3: Temporal Frame Math. The orchestrator queries the voice MP3 files using ffprobe to read the exact audio duration down to the millisecond. It multiplies the duration in seconds by the target framerate (e.g. 30 fps) and rounds up to get an integer frame duration.
Step 4: Image Collection. Visual cues are extracted from the script, and corresponding scenes are fetched from Pollinations.ai or Stable Diffusion endpoints to compile the visual layer.
Step 5: Remotion Build. The frame durations are passed as composition props to Remotion. Remotion mounts the components, synchronizes the voice track, overlays text captions, and renders a 1080x1920 MP4 file.
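
Steps 3 and 5 can be sketched as one small pure function. This is a sketch under stated assumptions, not the repository's code: `buildTimeline`, the shape of the `segments` array, and the segment ids are hypothetical. Only the rule itself (seconds multiplied by fps, rounded up) comes from the steps above. The `from` and `durationInFrames` fields mirror the props that Remotion's `<Sequence>` component accepts.

```javascript
// Hypothetical helper: turn per-segment audio durations (in seconds, as read
// by ffprobe) into integer frame counts and start offsets.
function buildTimeline(segments, fps) {
  let cursor = 0; // start frame of the next scene
  const scenes = segments.map((segment) => {
    // Round up so a scene never ends before its audio does.
    const durationInFrames = Math.ceil(segment.seconds * fps);
    const scene = { id: segment.id, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return scene;
  });
  // The total length of the composition is the end of the last scene.
  return { fps, durationInFrames: cursor, scenes };
}

const timeline = buildTimeline(
  [
    { id: "hook", seconds: 2.5 },
    { id: "segment-1", seconds: 4.25 },
    { id: "cta", seconds: 1.75 },
  ],
  30
);
console.log(JSON.stringify(timeline, null, 2));
// scenes: hook 0..75, segment-1 75..203, cta 203..256; total 256 frames
```

Because every offset is computed here, the React side only has to place each scene at its `from` frame for `durationInFrames` frames.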

Temporal Clock Calculator

The calculator simulates the Node.js ffprobe step, which computes the exact Remotion durationInFrames prop from a generated audio file. Its inputs are the audio length in seconds (the simulated ffprobe output) and the target video FPS. Its output is the set of props injected into Remotion.
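
The calculation can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `remotionProps` is hypothetical, and snapping to whole milliseconds before rounding up is a design choice of this sketch rather than something the source specifies.

```javascript
// Hypothetical helper mirroring the calculator: audio length in seconds and
// target FPS in, Remotion props out.
function remotionProps(audioSeconds, fps) {
  if (!Number.isFinite(audioSeconds) || audioSeconds <= 0) {
    throw new RangeError(`audioSeconds must be a positive number, got ${audioSeconds}`);
  }
  if (!Number.isFinite(fps) || fps <= 0) {
    throw new RangeError(`fps must be a positive number, got ${fps}`);
  }
  // Snap to whole milliseconds first so floating-point noise in the
  // multiplication cannot add a spurious extra frame.
  const milliseconds = Math.round(audioSeconds * 1000);
  const durationInFrames = Math.ceil((milliseconds * fps) / 1000);
  return { fps, durationInFrames };
}

for (const [seconds, fps] of [[12.345, 30], [12.345, 60], [2, 30]]) {
  console.log(`${seconds}s @ ${fps}fps ->`, remotionProps(seconds, fps));
}
```

For a 12.345 second clip at 30 fps, 12.345 × 30 = 370.35, which rounds up to 371 frames. A clip of exactly 2 seconds at 30 fps stays at 60 frames.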
                        

Programmatic Media Tooling

The pipeline targets free developer tiers and keyless APIs, so rendering incurs no runtime cost:

Step	API / Engine	Price
LLM Scripting	NVIDIA NIM Developer Console → Google Gemini Free API	$0.00 (Free tiers)
Text-to-Speech	Edge TTS (Reverse engineered Microsoft speech endpoint)	$0.00 (No key required)
Image Generation	Pollinations.ai (Flux & SDXL wrappers)	$0.00 (No key required)
Video Compilation	Remotion CLI + FFmpeg + ffprobe package	$0.00 (Local compiler)

File Architecture

pipeline/run.mjs: The coordinator. Calls script, voice, and visual steps, writes output data files, and fires the Remotion compiler.
pipeline/llm-router.mjs: The resiliency layer. A fallback router that moves to the next API key in line when an endpoint fails or is rate-limited.
pipeline/temporal-authority.mjs: Executes the ffprobe child process to read audio length and convert to frame integers.
src/Root.tsx: Declares composition structures and coordinates frame props inside Remotion.
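
The fallback behaviour described for pipeline/llm-router.mjs can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the `routeWithFallback` name, the `providers` array, and the `call` function on each provider are hypothetical, and the stub providers stand in for real API clients.

```javascript
// Hypothetical fallback router: try each provider in order and move on
// whenever one throws (network failure, HTTP 429 rate limit, and so on).
async function routeWithFallback(providers, prompt) {
  const failures = [];
  for (const provider of providers) {
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text };
    } catch (error) {
      // Record the failure and fall through to the next provider in line.
      failures.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed -> ${failures.join("; ")}`);
}

// Stub providers standing in for real clients; the first one always fails.
const stubProviders = [
  { name: "primary", call: async () => { throw new Error("429 rate limited"); } },
  { name: "secondary", call: async (prompt) => `script for: ${prompt}` },
];

routeWithFallback(stubProviders, "The science of deep sleep").then((result) => {
  console.log(result.provider); // prints "secondary"
});
```

Collecting the failure messages means that when every provider is down, the final error explains what went wrong at each step instead of reporting only the last failure.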

How to run it

git clone https://github.com/shubham0086/video-engine-starter
cd video-engine-starter
npm install
cp .env.example .env

# Run the pipeline for a video about deep sleep
node pipeline/run.mjs "The science of deep sleep"

# Open Remotion Studio to preview
npx remotion studio

# Compile video file to MP4
npx remotion render BasicReel out/deep-sleep.mp4

Where this fits

Video-Engine-Starter is the **programmatic media output** layer. It acts as Stage 4 of the Agentic OS video automation pipeline:
Brief Intake → Research → Outlines → [Video Engine Starter] → Publishing QA → Distribution

Honest framing

This is a developer starter kit, not a plug-and-play SaaS system. It renders basic slides, text layouts, captions, and static images. If you require advanced features like keyframe animations, audio filters, custom transitions, or multi-track audio layering, you will need to write custom Remotion templates in React. It is a foundation for building video programmatically.

Figure: Video Engine Pipeline overview

Figure: Programmatic Media Pipeline architecture diagram